theoretical property
- South America > Brazil (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
On the Theoretical Properties of Noise Correlation in Stochastic Optimization
Studying the properties of stochastic noise to optimize complex non-convex functions has been an active area of research in the field of machine learning. Prior work \citep{zhou2019pgd, wei2019noise} has shown that the noise of stochastic gradient descent improves optimization by overcoming undesirable obstacles in the landscape. Moreover, injecting artificial Gaussian noise has become a popular idea to quickly escape saddle points. Indeed, in the absence of reliable gradient information, the noise is used to explore the landscape, but it is unclear what type of noise is optimal in terms of exploration ability. In order to narrow this gap in our knowledge, we study a general type of continuous-time non-Markovian process, based on fractional Brownian motion, that allows for the increments of the process to be correlated. This generalizes processes based on Brownian motion, such as the Ornstein-Uhlenbeck process. We demonstrate how to discretize such processes, which gives rise to the new algorithm "fPGD".
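To make the idea of correlated increments concrete, here is a minimal sketch of gradient descent perturbed by fractional Gaussian noise (the increments of fractional Brownian motion). It is not the authors' fPGD: the function names, the Cholesky-based sampler, the step size, and the noise scale are all assumptions chosen for illustration. The only fact relied on is the standard autocovariance of unit-step fractional Gaussian noise with Hurst parameter H, namely 0.5 * (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)) at lag k.

```python
import numpy as np


def fgn_covariance(n, hurst):
    """Covariance matrix of n unit-step fractional Gaussian noise increments."""
    k = np.arange(n)
    two_h = 2.0 * hurst
    # Autocovariance at lag k: 0.5 * (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}).
    gamma = 0.5 * (
        np.abs(k + 1) ** two_h - 2.0 * np.abs(k) ** two_h + np.abs(k - 1) ** two_h
    )
    # Stationary increments: entry (i, j) depends only on |i - j|.
    return gamma[np.abs(k[:, None] - k[None, :])]


def sample_fgn(n, hurst, dim, rng):
    """Draw n correlated increments in R^dim, independent across coordinates."""
    chol = np.linalg.cholesky(fgn_covariance(n, hurst))
    return chol @ rng.standard_normal((n, dim))


def perturbed_gd(grad, x0, steps, lr, noise_scale, hurst, seed=0):
    """Illustrative gradient descent with correlated noise (hypothetical, not fPGD)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    noise = sample_fgn(steps, hurst, x.size, rng)
    for t in range(steps):
        x = x - lr * grad(x) + noise_scale * noise[t]
    return x


def saddle_grad(x):
    """Gradient of f(x, y) = x^2 - y^2, which has a saddle point at the origin."""
    return np.array([2.0 * x[0], -2.0 * x[1]])


# Start exactly at the saddle, where the gradient carries no information.
x_final = perturbed_gd(saddle_grad, [0.0, 0.0], steps=50, lr=0.1,
                       noise_scale=0.01, hurst=0.75)
```

With H = 0.5 the covariance matrix is the identity, so the sketch reduces to ordinary independent Gaussian perturbations. H > 0.5 gives positively correlated increments and H < 0.5 gives negatively correlated ones.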
e836d813fd184325132fca8edcdfb40e-Reviews.html
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Overview: This paper looks at the difficult problem of learning FST models of unaligned input and output sequences. This is an interesting problem and the approach appears to have merit; the main drawback of the paper is that some sections are very difficult to understand. In the abstract and introduction: the authors mention that the setting where sequences are not aligned is more realistic. I believe the authors, but examples of problems where unaligned sequences are the norm would be welcome.
Convex Regression with a Penalty
A common way to estimate an unknown convex regression function $f_0: Ω\subset \mathbb{R}^d \rightarrow \mathbb{R}$ from a set of $n$ noisy observations is to fit a convex function that minimizes the sum of squared errors. However, this estimator is known for its tendency to overfit near the boundary of $Ω$, posing significant challenges in real-world applications. In this paper, we introduce a new estimator of $f_0$ that avoids this overfitting by minimizing a penalty on the subgradient while enforcing an upper bound $s_n$ on the sum of squared errors. The key advantage of this method is that $s_n$ can be directly estimated from the data. We establish the uniform almost sure consistency of the proposed estimator and its subgradient over $Ω$ as $n \rightarrow \infty$ and derive convergence rates. The effectiveness of our estimator is illustrated through its application to estimating waiting times in a single-server queue.
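The contrast between the two estimators can be written as a pair of optimization problems. This is a sketch under assumptions, not the paper's exact formulation: $\theta_i$ stands for the fitted value at $x_i$, $\xi_i$ for a subgradient at $x_i$, and $J$ is a placeholder for the subgradient penalty, whose form the abstract does not specify. Whether $s_n$ bounds the raw or the normalized sum of squared errors is also assumed here.

```latex
\begin{aligned}
\text{(least squares)}\quad
  & \min_{\theta,\,\xi}\ \sum_{i=1}^{n} (y_i - \theta_i)^2
  && \text{s.t.}\quad \theta_j \ge \theta_i + \xi_i^{\top}(x_j - x_i)
     \quad \forall\, i, j, \\
\text{(penalized)}\quad
  & \min_{\theta,\,\xi}\ J(\xi_1, \dots, \xi_n)
  && \text{s.t.}\quad \theta_j \ge \theta_i + \xi_i^{\top}(x_j - x_i)
     \quad \forall\, i, j,
     \qquad \sum_{i=1}^{n} (y_i - \theta_i)^2 \le s_n .
\end{aligned}
```

The first-order inequalities are the usual finite-sample encoding of convexity. In the second problem the fit term moves from the objective into a constraint, so the data only need to be matched up to the level $s_n$.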
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > Nassau County > Garden City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
A Ring-Based Distributed Algorithm for Learning High-Dimensional Bayesian Networks
Laborda, Jorge D., Torrijos, Pablo, Puerta, José M., Gámez, José A.
Learning Bayesian Networks (BNs) from high-dimensional data is a complex and time-consuming task. Although there are approaches based on horizontal (instances) or vertical (variables) partitioning in the literature, none can guarantee the same theoretical properties as the Greedy Equivalence Search (GES) algorithm, except those based on the GES algorithm itself. In this paper, we propose a directed ring-based distributed method that uses GES as the local learning algorithm, ensuring the same theoretical properties as GES but requiring less CPU time. The method involves partitioning the set of possible edges and constraining each processor in the ring to work only with its received subset. The global learning process is an iterative algorithm that carries out several rounds until a convergence criterion is met. In each round, each processor receives a BN from its predecessor in the ring, fuses it with its own BN model, and uses the result as the starting solution for a local learning process constrained to its set of edges. Subsequently, it sends the model obtained to its successor in the ring. Experiments were carried out on three large domains (400-1000 variables), demonstrating our proposal's effectiveness compared to GES and its fast version (fGES).
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Spain > Castilla-La Mancha (0.05)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > New York (0.04)
Concentration inequalities and optimal number of layers for stochastic deep neural networks
Caprio, Michele, Mukherjee, Sayan
We state concentration inequalities for the output of the hidden layers of a stochastic deep neural network (SDNN), as well as for the output of the whole SDNN. These results allow us to introduce an expected classifier (EC), and to give a probabilistic upper bound for the classification error of the EC. We also state the optimal number of layers for the SDNN via an optimal stopping procedure. We apply our analysis to a stochastic version of a feedforward neural network with ReLU activation function.
- Europe > Germany > Saxony > Leipzig (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)